New word acquisition using subword modeling
Authors
Abstract
In this paper, we use subword modeling to learn the pronunciations and spellings of new words. The subwords are generated with a context-free grammar, and are intermediate units between phonemes and syllables. We first evaluate the effectiveness of the subword model in automatically generating the spelling and pronunciation of new words. Then the subword model is embedded in a multi-stage recognizer which consists of word, subword, and letter recognizers. In a preliminary set of experiments, the hybrid system outperforms a large-vocabulary isolated word recognizer. The subword model is also used to improve the performance of the letter recognizer by generating a spelling cohort which is used to train a small letter n-gram. The small letter n-gram has a reduced perplexity compared to a much larger n-gram, and can be used by the letter recognizer for the spoken spelling mode. This could translate to an improved letter error rate in future letter recognition experiments.
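The perplexity claim above can be illustrated with a toy sketch. It assumes a letter bigram with add-one smoothing over a fixed 27-symbol alphabet (26 letters plus an end marker); the abstract does not specify the n-gram order or the smoothing, and the word lists and function names below are hypothetical examples, not data or code from the paper.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical start/end-of-word markers


def train_letter_bigram(words):
    """Count letter bigrams and their left contexts over a list of spellings."""
    bigrams, contexts = Counter(), Counter()
    for word in words:
        seq = [BOS] + list(word) + [EOS]
        for left, right in zip(seq, seq[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts


def perplexity(model, word, alphabet_size=27):
    """Per-symbol perplexity of one spelling under an add-one smoothed bigram."""
    bigrams, contexts = model
    seq = [BOS] + list(word) + [EOS]
    log_prob = 0.0
    for left, right in zip(seq, seq[1:]):
        p = (bigrams[(left, right)] + 1) / (contexts[left] + alphabet_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / (len(seq) - 1))


# Toy "spelling cohort": a few candidate spellings of one new word (made up).
cohort_model = train_letter_bigram(["smith", "smyth", "smithe"])
# Toy stand-in for a broader lexicon (made up).
broad_model = train_letter_bigram(
    ["table", "garden", "window", "river", "smith", "orange", "yellow", "planet"]
)

print(perplexity(cohort_model, "smith"), perplexity(broad_model, "smith"))
```

In this toy setting the cohort-trained model assigns the target spelling a lower perplexity than the broader model, because its probability mass is concentrated on the few letter transitions the cohort allows. This only illustrates the direction of the effect and says nothing about its size in the paper's experiments.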
Similar resources
Reversible Sound-to-Letter/Letter-to-Sound Modeling Based on Syllable Structure
This paper describes a new grapheme-to-phoneme framework, based on a combination of formal linguistic and statistical methods. A context-free grammar is used to parse words into their underlying syllable structure, and a set of subword “spellneme” units encoding both phonemic and graphemic information can be automatically derived from the parsed words. A statistical n-gram model can then be train...
Improved Subword Modeling for WFST-Based Speech Recognition
Because in agglutinative languages the number of observed word forms is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating modified lexic...
The use of subword linguistic modeling for multiple tasks in speech recognition
Over the past several years, I have been conducting research on subword modeling in speech recognition. The research is most specifically aimed at the difficult task of identifying and characterizing unknown words, although the proposed framework also has utility in other recognition tasks such as phonological and prosodic modeling. The approach exploits the linguistic substructure of words by ...
Data-driven pronunciation modeling for ASR using acoustic subword units
We describe a method to model pronunciation variation for ASR in a data-driven way, namely by use of automatically derived acoustic subword units. The inventory of units is designed so as to produce maximal separable pronunciation variants of words while at the same time only the most important variants for the particular application are trained. In doing so, the optimal number of variants per ...
Comparison of whole word and subword modeling techniques for speaker verification with limited training data
In this paper we use whole word and subword hidden Markov models for text-dependent speaker verification. In this application usually only a small amount of training data is available for each model. In order to cope with this limitation we propose an intermediate functional representation of the training data allowing the robust initialization of the models. This new approach is tested with two ...
Publication date: 2007